DEDICATION


This paper is dedicated to the memory of Sidney Suslow,
a founding member of the Association for Institutional Research
and a man whose constant energy went into the support of its
purposes and goals. It was with Sid's support and encouragement
that we pursued our interest in higher educational
planning, and his pioneering work in obtaining longitudinal
data on students led directly to our work in the study of
longitudinal models.
0. Introduction


In 1968 Sidney Suslow, together with his colleagues in
the Office of Institutional Research at the Berkeley campus of
the University of California, completed a study (Suslow et al.
[4]) of undergraduate student attendance patterns over time.
That report contains some of the earliest data the authors had
seen on a given group, or cohort, of students, and on how the group
behaved over its undergraduate career. Most institutions keep
only cross-sectional data obtained from enrollment statistics.

It was the availability of the Suslow data that led the authors
to pursue the formulation and analysis of enrollment models
based on longitudinal student attendance patterns. The authors
presented a constant-work model (Marshall and Oliver [2]) which
explained the data quite successfully. Together with Suslow
in [3], they also tried to find cross-sectional Markovian
models to fit the longitudinal data (this latter work is
reproduced in a shortened form in Chapter 2 of Grinold and
Marshall [1], which is perhaps more accessible than [3]).

The  purpose  of  this  paper  is  to  demonstrate  how  the 
longitudinal  data  can  be  used  to  determine  variances,  and  hence 
confidence  bounds,  on  student  enrollment  forecasts  in  addition 
to  finding  the  forecasts  themselves.  Thus  with  each  forecast 
we  have  a measure  of  the  error  that  could  be  present. 
1. Model Formulation


We consider discrete points in time such as the beginning
of a quarter, semester, or academic year. The particular choice
depends on the model use and the availability of data. In our
numerical examples we use the data from Suslow et al. [4], and
hence our time points coincide with semesters. Thus when we write
t = 1, 2, 3, ..., we mean the start of the first, second, third, etc.
semesters in the future; t = 0 will refer to the point "now" from
which forecasts are being made, and t = -1, -2, -3, ... will refer to
the first, second, third, etc. semesters in the past.

Our  first  aim  is  to  derive  an  expression  for  the  expected 
number  in  attendance  at  some  time  t > 0.  We  do  not  differentiate 
groups  such  as  freshmen,  sophomores,  or  lower  division,  upper 
division.  This  could  easily  be  done  by  placing  subscripts  on  our 
notation,  but  we  choose  to  simplify  the  notation  to  be  consistent 
with  the  Suslow  data  on  total  student  attendance. 

Let S(t;u) be the number of students in attendance at
time t who entered (for the first time) at time t - u,
u = 0, 1, ... . Let S(t) be the total number of students in
attendance at time t. Then

\[ S(t) = S(t;0) + S(t;1) + S(t;2) + \cdots + S(t;u) + \cdots . \tag{1} \]

The data in [4] showed that for the periods studied
(1950's and 1960's) there was very stable behavior in student
attendance; the fraction of students who attended a given semester
after entrance was independent of when the students first entered.
However, only fall-entering cohorts were studied. We assume here
that stable behavior could be expected from spring-entering
cohorts also, but that fall- and spring-entering students could
have different continuation fractions. Let $p_1(u)$ be the
probability that a student attends u semesters after entering in
the fall, independent of the particular entrance time. Let $p_2(u)$
be the equivalent probability for spring-entering students. We also
assume that the attendance of any given student is independent
of the attendance or non-attendance of any other student; i.e.,
all students act independently of each other. Table 1 gives
$p_1(u)$ as determined by Suslow et al. in [4].

Let N(t) be the number of new students who enter at
time t. The above two assumptions imply that the value of (the
random variable) S(t;u), given the value of N(t-u), has a
binomial probability distribution. That is,

\[ \Pr[S(t;u) = k \mid N(t-u) = m] = \binom{m}{k}\, p_i(u)^k\, [1 - p_i(u)]^{m-k} , \tag{2} \]

for k = 0, 1, ..., m, and m ≥ 0, where i = 1 for fall students
and i = 2 for spring students. In particular the conditional
expectation and the conditional variance of S(t;u) are given
respectively by

\[ E[S(t;u) \mid N(t-u) = m] = m\, p_i(u) , \tag{3} \]

\[ \mathrm{Var}[S(t;u) \mid N(t-u) = m] = m\, p_i(u)\, [1 - p_i(u)] . \tag{4} \]
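As a quick numerical check of (3) and (4), the sketch below computes the mean and variance directly from the binomial probabilities in (2). The cohort size m = 40 and attendance probability 0.75 are hypothetical values chosen only for illustration; they are not taken from the Suslow data.

```python
from math import comb


def binomial_pmf(k: int, m: int, p: float) -> float:
    """Pr[S = k | N = m] as in equation (2)."""
    return comb(m, k) * p**k * (1.0 - p) ** (m - k)


def moments(m: int, p: float) -> tuple[float, float]:
    """Mean and variance obtained by summing over the pmf."""
    mean = sum(k * binomial_pmf(k, m, p) for k in range(m + 1))
    second = sum(k * k * binomial_pmf(k, m, p) for k in range(m + 1))
    return mean, second - mean**2


# Hypothetical cohort: 40 entrants, each attending with probability 0.75.
m, p = 40, 0.75
mean, var = moments(m, p)
print(mean, m * p)            # both approximately 30.0, equation (3)
print(var, m * p * (1 - p))   # both approximately 7.5, equation (4)
```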
TABLE 1: Sample student attendance data from Suslow et al. [4].


Let t be the start of a fall semester. After taking
expectations in (1) and using (3), the expected total enrollment
at time t is

\[ E[S(t)] = \sum_{u=0}^{\infty} p_{i(u)}(u)\, E[N(t-u)] . \tag{5} \]


Here we have let

\[ i(u) = \begin{cases} 1 & \text{if } u = 0, 2, 4, 6, \ldots \\ 2 & \text{if } u = 1, 3, 5, 7, \ldots . \end{cases} \]


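For any two random variables X and Y the expression

\[ \mathrm{Var}[X] = E[\mathrm{Var}[X \mid Y]] + \mathrm{Var}[E[X \mid Y]] \]

holds. We apply this to each term of (1), using (3) and (4); since
the cohort terms in (1) are independent, their variances add, and
we obtain for the variance of the total enrollment at time t

\[ \mathrm{Var}[S(t)] = \sum_{u=0}^{\infty} \Big\{ E[N(t-u)]\, p_{i(u)}(u)\, (1 - p_{i(u)}(u)) + p_{i(u)}(u)^2\, \mathrm{Var}[N(t-u)] \Big\} . \tag{6} \]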
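The contribution of a single cohort to (6) can be sketched as a one-line function. The inputs below (a cohort with mean 1000 and attendance probability 0.5) are illustrative assumptions; the point is only to show how the two parts of the variance decomposition combine for a known cohort size versus a Poisson one.

```python
def cohort_variance(mean_n: float, var_n: float, p: float) -> float:
    """One term of equation (6): E[N] p (1 - p) + p^2 Var[N]."""
    return mean_n * p * (1.0 - p) + p**2 * var_n


p = 0.5  # illustrative attendance probability, not from Table 1

# Cohort size known exactly: only the binomial part remains.
fixed = cohort_variance(mean_n=1000, var_n=0, p=p)

# Poisson cohort size (variance equals mean): the total collapses to n * p.
poisson = cohort_variance(mean_n=1000, var_n=1000, p=p)

print(fixed)    # 250.0
print(poisson)  # 500.0
```

With a Poisson input the two parts sum to n p(1 - p) + n p^2 = n p, which is the mean of the term itself.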


Equations (5) and (6) give the expected enrollment and
its variance at time t. Recall that t is a fall semester.
For the case when t is a spring semester we use

\[ i(u) = \begin{cases} 2 & \text{if } u = 0, 2, 4, 6, \ldots \\ 1 & \text{if } u = 1, 3, 5, 7, \ldots . \end{cases} \]


These expressions do not take into account the fact
that we have knowledge of enrollments up to time t = 0 (the
current time in our timing convention). In (5) we know the
values of N(0), N(-1), N(-2), etc. and thus our forecast for
t > 0 becomes

\[ E[S(t) \mid N(0), N(-1), \ldots] = \sum_{u=t}^{\infty} p_{i(u)}(u)\, N(t-u) + \sum_{u=0}^{t-1} p_{i(u)}(u)\, E[N(t-u)] , \tag{7} \]

where i(u) is defined above for the particular case that t
is either fall or spring. The first summation term in equation
(7) gives the expected "legacy" at time t of the given inputs
up to and including the current time zero. The second summation
gives the expected enrollment at time t from the expected input
of new students at times 1, 2, ..., t.

Similarly, by using equation (6), the variance of the
forecast at t, given inputs up to and including time zero,
becomes

\[ \mathrm{Var}[S(t) \mid N(0), N(-1), \ldots] = \sum_{u=t}^{\infty} p_{i(u)}(u)\, (1 - p_{i(u)}(u))\, N(t-u) + \sum_{u=0}^{t-1} \Big\{ p_{i(u)}(u)\, (1 - p_{i(u)}(u))\, E[N(t-u)] + p_{i(u)}(u)^2\, \mathrm{Var}[N(t-u)] \Big\} . \tag{8} \]

The first summation gives the contribution to the variance from
the inputs up to and including the present. The second summation
gives the contribution which will occur from future inputs. Note
that this depends on the variance of the new inputs for times
1, 2, ..., t as well as the variance due to returning students.
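Equations (7) and (8) translate into a short loop over cohorts. The following is a minimal sketch under stated assumptions: the attendance fractions `P` are not read from Table 1 (which is not legible here) but are back-derived from the cohort means tabulated in Table 3 (each mean divided by 3000 or 1000), fall and spring fractions are taken to be equal, and the names `forecast`, `history`, `future_mean` and `future_var` are hypothetical.

```python
# ASSUMPTION: p(u) for u = 0..16, back-derived from the cohort means in Table 3.
P = [1.000, 0.972, 0.905, 0.756, 0.684, 0.593, 0.562, 0.524, 0.498,
     0.199, 0.130, 0.050, 0.036, 0.017, 0.015, 0.011, 0.007]


def forecast(t, history, future_mean, future_var):
    """Return (mean, variance) of S(t) given N(0), N(-1), ...

    history[s]     observed intake N(s) for s <= 0
    future_mean[s] E[N(s)] for 1 <= s <= t
    future_var[s]  Var[N(s)] for 1 <= s <= t
    Assumes p1(u) = p2(u) = P[u], so i(u) plays no role.
    """
    mean = var = 0.0
    for u, p in enumerate(P):
        s = t - u  # entry time of the cohort that is u periods old at t
        if s <= 0:
            n = history.get(s, 0)            # known intake: first sums
            mean += p * n
            var += p * (1 - p) * n
        else:                                 # future intake: second sums
            mean += p * future_mean[s]
            var += p * (1 - p) * future_mean[s] + p * p * future_var[s]
    return mean, var


# t = 0 is a fall semester; past intakes were 3000 (fall) and 1000 (spring).
history = {s: (3000 if s % 2 == 0 else 1000) for s in range(-16, 1)}
# Next spring's intake is taken to be Poisson with mean (and variance) 1000.
mean, var = forecast(1, history, {1: 1000}, {1: 1000})
print(round(mean), round(var))  # 13203 4779
```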

Table 1 gives data for $p_1(u)$, u ≥ 0, obtained originally
in the study by Suslow et al. [4], and reproduced on page 66 of
[1]. The third and fourth columns give $p_1(u)(1 - p_1(u))$ and
$p_1(u)^2$ respectively. These data are required in equation (8),
whereas the data in column 2 are required in equation (7).

The usual interpretation given to the second column in
Table 1 is simply the fraction of attending students out of a
given cohort. The third column is the variance of the S(t;u)
terms divided by N(t-u). It is interesting to see how the
conditional expectation and the variance of the number of
attending students vary with the number of time periods that have
elapsed since initial registration. As one might expect, the
fraction of students out of a given cohort that return to attend
decreases rapidly, and there is a sharp drop in attendance after
eight semesters. By the end of the 12th semester the fraction
of attending students decreases to less than 4% of
the original cohort. However, the conditional variance of the
number returning first increases, has its maximum when seven or
eight semesters have elapsed, and then decreases to a negligible
amount by the end of the 12th semester. About the 12th semester,
the conditional expectation and variance of the number attending
are about equal; this result is not surprising if we recall
that the Poisson distribution (whose variance and mean are equal)
is a good approximation to the binomial distribution when the
probability p(u) is small. Thus, students returning after
10 periods can be classified as "rare" events in the sense that
while the probability that an individual student attends is
small, the original cohort is large enough that the probability
distribution of returning students is approximately Poisson. By
similar arguments one can deduce that the number who do not attend
in the first few semesters is also approximately Poisson distributed.
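The Poisson approximation for late-returning students can be checked numerically. This sketch compares the binomial and Poisson probabilities for a cohort of 3000 with attendance probability 0.007; that probability is an assumption, back-derived from the last fall entry of Table 3 (21 expected attendees out of 3000), and the function names are hypothetical.

```python
from math import comb, exp, factorial


def binomial_pmf(k: int, m: int, p: float) -> float:
    return comb(m, k) * p**k * (1.0 - p) ** (m - k)


def poisson_pmf(k: int, lam: float) -> float:
    return exp(-lam) * lam**k / factorial(k)


m, p = 3000, 0.007   # assumed cohort size and small attendance probability
lam = m * p          # Poisson mean matching the binomial mean

# Largest pointwise gap between the two distributions over k = 0..60.
max_gap = max(abs(binomial_pmf(k, m, p) - poisson_pmf(k, lam))
              for k in range(61))

# For the binomial, variance / mean = 1 - p, which is close to 1 when p is small.
variance_to_mean = (m * p * (1 - p)) / (m * p)

print(max_gap)
print(variance_to_mean)
```

The ratio of variance to mean is 1 - p, so the two moments agree to within one percent once p falls below 0.01.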

Consider a simple system where there is no variance in
the new student input, which is a fixed amount, say $n_1$, in each
fall semester, and a fixed amount $n_2$ in each spring semester.
Thus $E[N(t)] = n_i$ and Var[N(t)] = 0 for all t, where i = 1
for a fall semester and i = 2 in the spring. Using these in
(7) and (8), and assuming $p_1(u) = p_2(u)$ with the data in Table 1,
we obtain

\[ E[S(t)] = 3.837\, n_1 + 3.122\, n_2 , \qquad \mathrm{Var}[S(t)] = 0.968\, n_1 + 0.937\, n_2 \]

for t a fall semester, and

\[ E[S(t)] = 3.837\, n_2 + 3.122\, n_1 , \qquad \mathrm{Var}[S(t)] = 0.968\, n_2 + 0.937\, n_1 \]

for t a spring semester. All these expressions are independent
of t because of the constant input each period.
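The four coefficients in these steady-state expressions are sums of p(u) and p(u)(1 - p(u)) over even and odd u. The sketch below recomputes them; note that the list `P` is an assumption, back-derived from the cohort means in Table 3 rather than read from Table 1, and the resulting coefficients are the ones that reproduce the entries of Table 2.

```python
# ASSUMPTION: p(u) for u = 0..16, back-derived from the cohort means in Table 3.
P = [1.000, 0.972, 0.905, 0.756, 0.684, 0.593, 0.562, 0.524, 0.498,
     0.199, 0.130, 0.050, 0.036, 0.017, 0.015, 0.011, 0.007]

even = P[0::2]  # cohorts that entered in the same kind of semester as t
odd = P[1::2]   # cohorts that entered in the other kind of semester

mean_same = sum(even)
mean_other = sum(odd)
var_same = sum(p * (1 - p) for p in even)
var_other = sum(p * (1 - p) for p in odd)

print(round(mean_same, 3), round(mean_other, 3))  # 3.837 3.122
print(round(var_same, 3), round(var_other, 3))    # 0.968 0.937
```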
Table 2 illustrates the use of these equations for three
combinations of fall and spring input totalling 4000 per year,
assuming $p_1(u) = p_2(u)$ as given in Table 1.


Semester     Input     Expected Enrollment     Variance of Enrollment

Fall         4,000           15,348                    3,872
Spring           0           12,488                    3,748

Fall         3,000           14,633                    3,841
Spring       1,000           13,203                    3,779

Fall         2,000           13,918                    3,810
Spring       2,000           13,918                    3,810

TABLE 2: Illustrative calculations for differing fall/spring
input values.
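The rows of Table 2 follow from the steady-state coefficients by simple arithmetic. The sketch below is one way to reproduce them; the function name `steady_state` is hypothetical, and the fall-on-fall coefficient 3.837 is the value implied by the tabulated enrollments (for example 15,348 / 4,000).

```python
# Steady-state coefficients; 3.837 is the value consistent with Table 2.
MEAN_SAME, MEAN_OTHER = 3.837, 3.122
VAR_SAME, VAR_OTHER = 0.968, 0.937


def steady_state(n_this: int, n_other: int) -> tuple[int, int]:
    """Expected enrollment and variance for a semester whose own intake
    is n_this and whose alternate-semester intake is n_other."""
    mean = MEAN_SAME * n_this + MEAN_OTHER * n_other
    var = VAR_SAME * n_this + VAR_OTHER * n_other
    return round(mean), round(var)


for n_fall, n_spring in [(4000, 0), (3000, 1000), (2000, 2000)]:
    print("Fall  ", n_fall, steady_state(n_fall, n_spring))
    print("Spring", n_spring, steady_state(n_spring, n_fall))
```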


A fairly typical use for Equations (7) and (8) is that
of forecasting one period into the future. With the convention
that t = 0 represents today (the start of a fall semester),
we obtain the next period forecast

\[ E[S(1) \mid N(0), N(-1), \ldots] = \sum_{u=1}^{\infty} p_{i(u)}(u)\, N(1-u) + E[N(1)] , \]

with i(u) = 2 for u even, i(u) = 1 for u odd (t = 1 being a
spring semester), and provided $p_i(0) = 1$. The first (summation)
term represents the expected number of returning students and the
second term represents the expected number of new admissions. The
corresponding expression for the variance of enrollments in the
next period is

\[ \mathrm{Var}[S(1) \mid N(0), N(-1), \ldots] = \sum_{u=1}^{\infty} p_{i(u)}(u)\, (1 - p_{i(u)}(u))\, N(1-u) + \mathrm{Var}[N(1)] . \]
In this case, where we assume all entering students in fact
show up, the fluctuations are due either to the uncertainty
in the count of returning students already enrolled or to the
uncertainty in the new students. Thus one can obtain some idea
of where new forecasting efforts should be directed. In certain
institutions the dominant problem may be the uncertainties
associated with returning students rather than with new students.

If, for example, the past cohorts were approximately 3000 in
each fall and 1000 in each spring, but the next group of
entering students were Poisson with expected number and variance
equal to 1000, then we would have (from Table 2)

\[ \mathrm{Var}[S(1) \mid N(0), N(-1), \ldots] = 3779 + 1000 = 4779 . \]
In this case, two standard deviations (a measure of error often
used and based on Normal distribution theory) would be 138 students,
which is slightly larger than the value we obtain when all
admissions are constant ($2 \times \sqrt{3779} \approx 123$ from Table 2). In
other words it is possible to make various assumptions about the
uncertainty of future enrollments and/or returning students and
easily include them in our estimates of enrollment fluctuations.
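The arithmetic behind this comparison is short enough to sketch directly. The two variances are taken from the example above (3779 for returning students, 1000 for a Poisson intake); the variable names are hypothetical.

```python
from math import sqrt

var_returning = 3779      # spring row of Table 2 (fall 3000, spring 1000)
var_new_poisson = 1000    # Poisson intake: variance equals the mean

two_sd_fixed = 2 * sqrt(var_returning)
two_sd_poisson = 2 * sqrt(var_returning + var_new_poisson)

# Share of the forecast variance attributable to new admissions.
share_new = var_new_poisson / (var_returning + var_new_poisson)

print(round(two_sd_poisson))   # 138
print(round(two_sd_fixed, 1))  # 122.9
print(round(share_new, 2))     # 0.21
```

In this example roughly four fifths of the forecast variance comes from returning students, which is the sense in which the decomposition indicates where forecasting effort should go.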

It  is  unlikely  that  student  input  each  period  would  be 
constant.  In  the  next  section  we  analyze  the  model  assuming 
that  new  admissions  follow  a Poisson  distribution. 


2.  Poisson  Admissions 


The number of new students who actually enroll in a
given future semester is not known with certainty. A simple
method of modelling this uncertainty is to assume the number
of new enrollments follows a Poisson distribution. Let n(t)
be the expected number of new enrollments at time t. Then

\[ \Pr[N(t) = m] = \frac{n(t)^m\, e^{-n(t)}}{m!} , \qquad m \ge 0 . \tag{9} \]

From equations (2) and (9) we get

\[ \Pr[S(t;u) = k] = \frac{[p_{i(u)}(u)\, n(t-u)]^k\, e^{-p_{i(u)}(u)\, n(t-u)}}{k!} , \qquad k \ge 0 . \tag{10} \]

This shows that each random variable in (1) has a Poisson
distribution, which together with our independence assumption
implies that the total enrollment at time t has a Poisson
distribution at every time t, with

\[ E[S(t)] = \mathrm{Var}[S(t)] = \sum_{u=0}^{\infty} p_{i(u)}(u)\, n(t-u) . \]
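The step from (2) and (9) to (10) can be written out as follows. This is a sketch of the standard argument, with the shorthand p for $p_{i(u)}(u)$ and n for n(t-u) introduced here only to keep the lines short.

```latex
\begin{aligned}
\Pr[S(t;u) = k]
  &= \sum_{m=k}^{\infty} \binom{m}{k} p^k (1-p)^{m-k}\, \frac{n^m e^{-n}}{m!} \\
  &= \frac{(pn)^k e^{-n}}{k!} \sum_{m=k}^{\infty} \frac{[(1-p)\,n]^{m-k}}{(m-k)!} \\
  &= \frac{(pn)^k e^{-n}}{k!}\, e^{(1-p)n}
   = \frac{(pn)^k e^{-pn}}{k!} , \qquad k \ge 0 .
\end{aligned}
```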

Using our previous example, but with Poisson input
instead of fixed input, with $n_1 = 3000$, $n_2 = 1000$ and
$p_1(u) = p_2(u)$ as in Table 1, we get again an expected enrollment
of 14,633 each fall and 13,203 each spring, but with variances
of the same values. Thus two standard deviations would be 242
each fall and 230 each spring, which shows much more uncertainty
in the forecasts, as one would expect.
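The effect of Poisson input on the error band can be sketched by comparing two standard deviations under the two assumptions. The means and fixed-input variances below are the Table 2 values for a fall intake of 3000 and a spring intake of 1000; under Poisson input the variance equals the mean.

```python
from math import sqrt

# (label, expected enrollment, variance with fixed input) from Table 2.
cases = [("fall", 14633, 3841), ("spring", 13203, 3779)]

results = {}
for label, mean, var_fixed in cases:
    two_sd_poisson = 2 * sqrt(mean)       # Poisson: variance = mean
    two_sd_fixed = 2 * sqrt(var_fixed)    # fixed input: binomial part only
    results[label] = (round(two_sd_poisson), round(two_sd_fixed))
    print(label, results[label])
# fall (242, 124)
# spring (230, 123)
```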


3. Large Cohort Sizes

We have already shown in equation (2) that the number
of students attending out of a given entering cohort can be viewed
as the result of summing successes in Bernoulli trials, where
the probability of success is the probability that a student
attends in a given semester. Thus, if we add a finite number of
such random variables to obtain the attendance at a later time
period we again obtain a sum of successes in a finite number of
Bernoulli trials. If the parameter $p_i(u)$ of the binomial
distribution in (2) did not change with time, then it would also be
true that the sum in (1) is binomially distributed. This follows
from the derivation of the distribution of the sum of successes
in a finite number of Bernoulli trials, each trial having the
same probability of success. Unfortunately, that is not the
case; as we can easily see from Table 1, the parameter $p_i(u)$
changes rather dramatically with elapsed time since entry, and
the resulting distribution is obtained from the convolution of
as many binomial distributions, with changing parameters, as
there are terms in (1). Although explicit expressions can be
found for the generating function of such distributions,
algebraic expressions for the distribution itself are not simple.
Fortunately, however, much can be said about the approximate
behavior of the conditional distribution of S(t) if we assume
that entering cohorts contain large numbers of students.

The central limit theorem of probability theory states
that if S(t;u) is the sum of the number of successes in n(t-u)
trials each with success probability $p_i(u)$, then the normalized
sum

\[ S^*(t;u) = \frac{S(t;u) - p_i(u)\, n(t-u)}{[\, p_i(u)\, (1 - p_i(u))\, n(t-u) \,]^{1/2}} \tag{11} \]

is approximately normally distributed. If we write

\[ \Phi(a) = \frac{1}{\sqrt{2\pi}} \int_{-\infty}^{a} e^{-y^2/2}\, dy \]

for the normal distribution function, then with large cohort
sizes, i.e., large numbers entering at t-u,

\[ \Pr[S^*(t;u) \le a] \approx \Phi(a) , \quad \text{independent of } p_i(u) \text{ and } n(t-u) . \tag{12} \]

As long as each entering cohort is large and entering cohorts
act independently of one another, the sum of a finite number of
terms in (1) is also approximately normal. In this case

\[ \Pr[S^*(t) \le a] \approx \Phi(a) , \tag{13} \]

where the normalization for S(t) is given by

\[ S^*(t) = \frac{S(t) - \sum_{u \ge 0} p_{i(u)}(u)\, n(t-u)}{\Big( \sum_{u \ge 0} n(t-u)\, p_{i(u)}(u)\, (1 - p_{i(u)}(u)) \Big)^{1/2}} . \tag{14} \]
Table 3 gives E[S(t;u)] for u = 0, 1, ..., 16 and
E[S(t)] together with 95% confidence intervals. Also tabulated
is the length of the confidence intervals as a percentage of
the expected values. Fall and spring semesters are shown in
separate columns for clarity (again t is assumed to be a fall
semester). Note how the uncertainty as a percentage of the mean
increases with time enrolled, and how small the error is on
the total enrolled forecast compared to the individual semesters.

Equations (13) and (14) can be used to obtain more
information on the uncertainty in S(t); one can estimate the
probability of the enrollment exceeding any given figure, of
not exceeding any given figure, or of being in any given range.
Let a and b be any two numbers with a < b. Then for
$n_1 = 3000$, $n_2 = 1000$, t a fall semester, and the data given
in Table 1 with $p_1(u) = p_2(u)$,

\[ \Pr[a \le S(t) \le b] \approx \Phi\!\left( \frac{b - 14{,}633}{\sqrt{3841}} \right) - \Phi\!\left( \frac{a - 14{,}633}{\sqrt{3841}} \right) . \]

From tables of the normal distribution we see that

\[ \begin{aligned} \Pr[S(t) \le 14{,}700] &\approx 0.86 , \\ \Pr[S(t) \ge 14{,}500] &\approx 0.98 , \\ \Pr[14{,}500 \le S(t) \le 14{,}700] &\approx 0.84 . \end{aligned} \tag{15} \]
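These probabilities can be recomputed without tables by writing the normal distribution function in terms of the error function. The sketch below assumes the fixed-input case, with mean 14,633 and variance 3,841 taken from Table 2; the helper names `phi` and `prob_between` are hypothetical.

```python
from math import erf, sqrt


def phi(x: float) -> float:
    """Standard normal distribution function."""
    return 0.5 * (1.0 + erf(x / sqrt(2.0)))


def prob_between(a: float, b: float, mean: float, var: float) -> float:
    """Normal approximation to Pr[a <= S(t) <= b]."""
    sd = sqrt(var)
    return phi((b - mean) / sd) - phi((a - mean) / sd)


MEAN, VAR = 14633, 3841   # fall semester, fixed inputs 3000 / 1000

p_at_most = phi((14700 - MEAN) / sqrt(VAR))
p_at_least = 1.0 - phi((14500 - MEAN) / sqrt(VAR))
p_in_range = prob_between(14500, 14700, MEAN, VAR)

print(round(p_at_most, 2))   # 0.86
print(round(p_at_least, 2))  # 0.98
print(round(p_in_range, 2))  # 0.84
```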


Time u    E[S(t;u)] and 95%             Confidence Interval as
          Confidence Interval           % of E[S(t;u)]

          Fall          Spring          Fall        Spring

   0      3000 ± 0                        0
   1                     972 ± 10
   2      2715 ± 32
   3                     756 ± 27
   4      2052 ± 51
   5                     593 ± 31                    10.5
   6      1686 ± 54
   7                     524 ± 32                    12.2
   8      1494 ± 55
   9                     199 ± 25                    25.1
  10       390 ± 37                      19.0
  11                      50 ± 14                    56.0
  12       108 ± 20
  13                      17 ± 8                     94.1
  14        45 ± 13                      57.8
  15                      11 ± 7                    127.3
  16        21 ± 9                       85.7

Total   14,633 ± 124

TABLE 3: Forecasts and confidence intervals for each semester
enrollment, $n_1 = 3000$, $n_2 = 1000$, and t a fall semester.
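The entries of Table 3 can be regenerated from the attendance fractions. This is a sketch under assumptions: the list `P` is back-derived from the tabulated cohort means rather than read from Table 1, the interval half-width is taken as two standard deviations rounded to the nearest student, and the percentage is the full interval length (twice the rounded half-width) over the mean, which is the convention that matches the legible entries.

```python
from math import sqrt

# ASSUMPTION: p(u) for u = 0..16, back-derived from the cohort means in Table 3.
P = [1.000, 0.972, 0.905, 0.756, 0.684, 0.593, 0.562, 0.524, 0.498,
     0.199, 0.130, 0.050, 0.036, 0.017, 0.015, 0.011, 0.007]
N_FALL, N_SPRING = 3000, 1000


def table3_rows():
    """(u, mean, half-width, interval length as % of mean) for each cohort."""
    rows = []
    for u, p in enumerate(P):
        n = N_FALL if u % 2 == 0 else N_SPRING   # t is a fall semester
        mean = n * p
        half = round(2 * sqrt(n * p * (1 - p)))  # two standard deviations
        pct = 100 * 2 * half / mean
        rows.append((u, round(mean), half, round(pct, 1)))
    return rows


rows = table3_rows()
for row in rows:
    print(row)

# Total enrollment: means and variances add over independent cohorts.
total_mean = sum(N_FALL * p if u % 2 == 0 else N_SPRING * p
                 for u, p in enumerate(P))
total_var = sum((N_FALL if u % 2 == 0 else N_SPRING) * p * (1 - p)
                for u, p in enumerate(P))
print(round(total_mean), round(2 * sqrt(total_var)))  # 14633 124
```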
The Normal approximation for S(t) still holds if the
admissions each semester are assumed to be Poisson, since the
total enrollment is the sum of independent Poisson random
variables with distribution given by (10). In this case we
consider

\[ S^*(t) = \frac{S(t) - \sum_{u \ge 0} p_{i(u)}(u)\, n(t-u)}{\Big( \sum_{u \ge 0} p_{i(u)}(u)\, n(t-u) \Big)^{1/2}} . \]

For fall Poisson inputs with mean 3000, spring Poisson inputs
with mean 1000, t a fall semester, and assuming $p_1(u) = p_2(u)$
as given in Table 1,

\[ \Pr[a \le S(t) \le b] \approx \Phi\!\left( \frac{b - 14{,}633}{\sqrt{14{,}633}} \right) - \Phi\!\left( \frac{a - 14{,}633}{\sqrt{14{,}633}} \right) . \]

In this case

\[ \begin{aligned} \Pr[S(t) \le 14{,}700] &\approx 0.71 , \\ \Pr[S(t) \ge 14{,}500] &\approx 0.86 , \\ \Pr[14{,}500 \le S(t) \le 14{,}700] &\approx 0.57 . \end{aligned} \tag{16} \]

A comparison of (15) and (16) shows the added uncertainty in
the forecast due to randomness in the numbers of admissions.
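The comparison between the fixed-input and Poisson-input cases can be sketched with the standard library's `statistics.NormalDist`. Both distributions share the mean 14,633; the fixed-input variance 3,841 is the Table 2 value, and the Poisson variance equals the mean. The helper name `summary` is hypothetical.

```python
from math import sqrt
from statistics import NormalDist

MEAN = 14633
fixed = NormalDist(MEAN, sqrt(3841))     # known intakes of 3000 / 1000
poisson = NormalDist(MEAN, sqrt(MEAN))   # Poisson intakes, variance = mean


def summary(dist: NormalDist) -> tuple[float, float, float]:
    """Pr[S <= 14,700], Pr[S >= 14,500], Pr[14,500 <= S <= 14,700]."""
    p_at_most = dist.cdf(14700)
    p_at_least = 1.0 - dist.cdf(14500)
    p_in_range = dist.cdf(14700) - dist.cdf(14500)
    return tuple(round(x, 2) for x in (p_at_most, p_at_least, p_in_range))


print(summary(fixed))    # (0.86, 0.98, 0.84)
print(summary(poisson))  # (0.71, 0.86, 0.57)
```

The same objects also give quantiles through `inv_cdf`, which is convenient when the question is posed as an enrollment level to plan for rather than a probability.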